Abstract. The United States Code (Code) is an important source of Federal law that is produced by the 
interactions of many heterogeneous actors in a complex, dynamic space. The Code can be represented as 
the union of a hierarchical network and a citation network over the vertices representing the language of 
the Code. In this paper, we investigate the properties of the Code's citation network by examining the 
directed degree distributions of the network. We find that the power-law model is a plausible fit for the 
outdegree distribution but not for the indegree distribution. In order to better understand this result, we 
construct a model with the assumption that the probability of citation is a per-word rate. We calculate 
the adjusted degree of each vertex under this model and study the directed adjusted degree distributions. 
These adjusted degree distributions indicate that both the adjusted indegree and outdegree distributions 
seem to follow a log-normal form, not a power-law form. Our findings indicate that the power-law is not 
generally applicable to degree distributions within the United States Code but that the distribution of 
degree per word is well-described by a log-normal model. 



1 Introduction 

Many real systems are characterized by the interactions 
of heterogeneous actors operating in a complex, dynamic 
space. One important example of such a system is the 
legislative system of the United States. The actors in this 
system include members of Congress and the Federal Gov- 
ernment, as well as the wide range of individuals, organi- 
zations, and even countries who act to further their own 
policy goals. Furthermore, the space of all possible pol- 
icy configurations is combinatorially complex and nuanced 
beyond formalization. One feasible approach to studying 
this legislative system is to examine a representation of 
its output: the United States Code (Code). 

The Code is a 22-million-word compilation of legislation 
and treaties that have been ratified by Congress. 



This compilation is performed through a process known as 
codification, which is carried out by the Office of the Law Re- 
vision Counsel (LRC), an organization within the U.S. House 
of Representatives. 2 U.S.C. §§285-285g outline the purpose, 
policy, and functions of the Office of the Law Revision Counsel. 
The LRC's goal in this codification process is to transform 
the incremental and chronological Statutes at Large into the 
Code, a current snapshot of the law that is organized into hi- 
erarchical categories. The Code is only prima facie evidence 
of Federal law. In the event of a discrepancy, the Statutes at 
Large are the final authority. Furthermore, additional sources 
such as the Code of Federal Regulations contain clarifications 
issued by other Federal agencies or bodies. 



This compilation can be mathematically represented as 
the union of a hierarchical network and a citation net- 
work over the vertices representing the language of the 
Code (PJ). The purpose of this paper is to describe the 
properties of this citation network and explore possible 
explanatory dynamics. 

In the remainder of the paper, we first describe the 
dataset and provide summary statistics of the Code's citation 
network in Section 2. In Section 3, we assess whether 
the power-law is a plausible model for the directed degree 
distributions. In Section 4, we consider an alternative hypothesis 
that the governing dynamic is instead "degree 
per word." We then test both a power-law and log-normal 
model for this adjusted "degree per word" data. Finally, 
we conclude with a summary of results in Section 5. 



2 Data 

In order to construct a mathematical representation G = 
(V, E) of the Code, we obtained an XML snapshot of 
the Code from March 2010. These data were provided by the 
Legal Information Institute at the Cornell University Law 
School ([6]). Citations are explicitly coded within these 
XML documents at the section level. Sections in the Code 
are similar in purpose to sections within articles, such as 
this "Data" section. Within the Code, they are unique 
among possible units of analysis in that they are guar- 
anteed to contain complete grammatically parsable legal 
text. Therefore, they make a natural unit of analysis for 
evaluating linguistic dynamics ([!]). 

The Code has 59,988 sections. However, since some 
sections are isolates, the largest weakly connected component 
of the section citation network has only |V| = 34,674 
vertices. The total number of edges in the network is 
|E| = 85,921. These figures imply a density of 7.15 × 10^-5, 
indicating that the network is relatively sparse. 
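
The density figure can be checked with a short calculation. The sketch below assumes the standard directed-graph definition |E| / (|V|(|V| - 1)) and combines the vertex count of the largest weakly connected component with the total edge count, which is the combination that reproduces the reported value. The function name is ours and is not part of the original analysis.

```python
def directed_density(num_vertices: int, num_edges: int) -> float:
    """Density of a simple directed graph: |E| / (|V| * (|V| - 1))."""
    if num_vertices < 2:
        raise ValueError("density needs at least two vertices")
    return num_edges / (num_vertices * (num_vertices - 1))


# Counts reported in this section (assumed to be the inputs to the density).
density = directed_density(34_674, 85_921)
print(f"{density:.2e}")
```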

3 Degree 

When presented with a network, the first step is often 
to study its degree distribution. Characterizations of the 
degree distributions are both useful in explaining macro- 
scopic properties of the network and straightforward to 
calculate (0). 
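
As an illustration of how directly these quantities follow from an edge list, the sketch below counts indegree and outdegree and tabulates the empirical degree distributions. The edge list and its vertex labels are hypothetical and stand in for (citing section, cited section) pairs.

```python
from collections import Counter

# Hypothetical citation edges (citing section, cited section); labels are illustrative only.
edges = [
    ("A", "B"),
    ("A", "C"),
    ("B", "C"),
    ("D", "C"),
]

# Outdegree counts citations made; indegree counts citations received.
outdegree = Counter(src for src, _ in edges)
indegree = Counter(dst for _, dst in edges)

# Empirical degree distributions: how many vertices have each degree value.
# A Counter returns 0 for vertices that never appear on the relevant side of an edge.
vertices = {v for edge in edges for v in edge}
indegree_distribution = Counter(indegree[v] for v in vertices)
outdegree_distribution = Counter(outdegree[v] for v in vertices)
```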

At first glance, the indegree and outdegree distribu- 
tions appear highly skewed, with skewness values of 15.38 
and 6.07, respectively. When such skewed degree distributions 
are encountered, power-law distributions are often 
present. To test for the existence of the power-law 
phenomenon in this network, we apply the methods of 
Clauset et al. ([!]) and assess the scaling parameter α and 
p-value of each fit. These methods are designed to determine 
whether the data follow the form 

$p(\delta) \propto \delta^{-\alpha}$ (1) 

where δ is degree and α is the scaling exponent. The p-values 
are obtained by the software provided in [T|. 



Section          Indegree (δ−)   Outdegree (δ+) 
26 U.S.C. 501    261             42 
5 U.S.C. 552     231             4 
6 U.S.C. 1       227             1 
6 U.S.C. 3       226             4 
16 U.S.C. 2      223             2 

Table 1. Five Highest Indegree Sections 



Section          Indegree (δ−)   Outdegree (δ+) 
18 U.S.C. 2516   4               119 
18 U.S.C. 1956   28              80 
42 U.S.C. 1396a  63              76 
15 U.S.C. 78c    73              68 
18 U.S.C. 1841                   64 

Table 2. Five Highest Outdegree Sections 

Figure 1 shows the log-log indegree distribution and 
its power-law fit as calculated by pQ. α and x_min are 2.68 
and 10, respectively. Though α falls between 2 and 3, the 
p-value for this model is 0.084, below the 0.1 threshold at 
which Clauset et al. rule out a power law, and the region of 
fit spans only one order of magnitude. It thus seems fairly 
implausible that the data come from a power-law process. This 
deviation is most likely due to the steep shift in slope 
at the tail of the data. Table 1 lists the sections in this 
tail. These sections correspond to some of the most well-known 
and broadly applicable portions of Federal statutory 
law. For example, the most cited section, 26 U.S.C. 
§501, contains §501(c)(3), the clause defining the conditions 
for tax-exempt organizations. Provisions throughout 
the Code rely on this definition. 

Fig. 1. Log-Log Indegree Distribution with Power-Law Fit. 

Fig. 2. Log-Log Outdegree Distribution with Power-Law Fit. 

Figure 2 plots the log-log outdegree distribution and 
its power-law fit. α and x_min are 3.5 and 15, respectively. 
In this case, the p-value is 0.58 and it thus seems plausible 
that the data are generated by a power-law. However, the 
region of fit is only one order of magnitude and the scaling 
parameter α is greater than 3. This implies that these data 
are an extreme case of the power-law. The value of α = 3.5 
is greater than all but 3 of the 17 identified power laws in 

Just as in the indegree distribution, much of the deviation 
is produced in the steep slope at the end of the tail. 
Table 2 lists the sections that contribute most to this tail. 
Many of these sections deal with investigating criminal 
conduct and thus list the activities which allow for the 
involvement of Federal law enforcement. Though the degree 
distributions above exhibit significant skewness, these results 
indicate that degree distributions of sections cannot 
simply be modeled as power-law processes. 

4 Adjusted Degree 

In Section 3 above, we model the sections of the Code as 
equal in probability of both citing and being cited. In re- 
ality, each section contains a varying amount of language. 
Thus, a more appropriate model is to assume that the 
probabilities of both citing and being cited are propor- 
tional to the amount of language contained in each section. 
This assumption is equivalent to the following functional 
form: 



(2) 



where δ and L are the degree and amount of language of 
a section. Given that the number of words per section can 
vary by orders of magnitude within the Code, this seems 
like a much more plausible model. 

In order to evaluate this model, we need to define what 
the "amount of language" L means. The most natural unit 
of analysis here is a single "token," as each token typically 
corresponds to a single word. In order to count these tokens 
in English text, we tokenize each section according to 
the Penn Treebank and let L be the number of resulting 
strings (0). This allows us to calculate the adjusted degree, 
the degree divided by L, for each section of the Code. 
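
The adjusted degree calculation can be sketched as follows. Note that this example does not implement Penn Treebank tokenization: it substitutes a simple regular expression that splits text into word and punctuation tokens, which is an assumption made to keep the example self-contained. The function names and the sample sentence are likewise hypothetical.

```python
import re

# Stand-in tokenizer (NOT the Penn Treebank rules): runs of word characters,
# or any single non-space, non-word character such as punctuation.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def count_tokens(text: str) -> int:
    """Number of tokens L in a section's text under the stand-in tokenizer."""
    return len(_TOKEN.findall(text))


def adjusted_degree(degree: int, text: str) -> float:
    """Degree per token: the section's degree divided by its token count L."""
    num_tokens = count_tokens(text)
    if num_tokens == 0:
        raise ValueError("section has no tokens")
    return degree / num_tokens


# Hypothetical section text and degree, for illustration only.
example = "No tax shall be imposed under section 501."
rate = adjusted_degree(3, example)
```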

Figure 3 shows the log-log adjusted indegree distribution 
and its power-law fit. α and x_min are 2.3 and 0.026, 
respectively. Though α lies between 2 and 3, the p-value 
is 0.003, soundly rejecting the model of a power-law generating 
process. This rejection is due to the significant 
convexity displayed for small values of the distribution. 

Figure 4 plots the log-log adjusted outdegree distribution 
and its power-law fit. α and x_min are 3.2 and 0.023, 
respectively. In this case, p ≈ 0 and thus the power-law 
model is once again highly implausible. Here, 
the convexity of the log-log distribution is even more 
pronounced. 

These calculations indicate that the adjusted degree 
distributions do not seem to follow a power-law form. They 
do, however, appear to approximately obey a log-normal 
distribution of the form: 



$p(x) \propto \frac{1}{x} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$ (3) 

where μ and σ are the usual mean and standard deviation of ln x. 

Figure 5 displays the distribution of log adjusted indegree 
and the corresponding normal fit with μ = −5.09 and 
σ = 1.49. These estimates are significant at the p = 0.05 
level and have standard errors on the order of 10^-3. The 
figure indicates that this distribution is a good fit, though 
the slight positive skewness results in asymmetric tails. 
This confirms our model that the logarithm of the rate of 
citations received per word is normally distributed. 
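
A normal fit to log-transformed rates of this kind can be sketched as below. This is an illustration under stated assumptions rather than the procedure used in this paper: sections with zero adjusted degree are dropped because their logarithm is undefined, the parameters are estimated by the sample mean and sample standard deviation of the logs, and the data are synthetic draws using the parameter values quoted above.

```python
import math
import random
import statistics


def fit_log_normal(values):
    """Fit a normal distribution to ln(x) for strictly positive values.

    Returns (mu, sigma, standard_error_of_mu). Non-positive values are
    skipped, which is an assumption of this sketch.
    """
    logs = [math.log(v) for v in values if v > 0]
    if len(logs) < 2:
        raise ValueError("need at least two positive values")
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    return mu, sigma, sigma / math.sqrt(len(logs))


# Synthetic adjusted degrees drawn from a log-normal with the quoted parameters.
rng = random.Random(0)
synthetic = [rng.lognormvariate(-5.09, 1.49) for _ in range(20_000)]
mu_hat, sigma_hat, mu_se = fit_log_normal(synthetic)
print(mu_hat, sigma_hat, mu_se)
```

With tens of thousands of observations and σ near 1.5, the standard error of the mean is roughly σ divided by the square root of the sample size, which comes to about 0.01.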

Fig. 3. Log-Log Adjusted Indegree Distribution with Power-Law Fit. 

Fig. 4. Log-Log Adjusted Outdegree Distribution with Power-Law Fit. 



Figure 6 plots the distribution of log adjusted outdegree 
and the corresponding normal fit with μ = −5.10 and 
σ = 1.01. These estimates are significant at the p = 0.05 
level and have standard errors on the order of 10^-3. The 
figure shows that this distribution is an excellent fit with 
nearly symmetric tails. This confirms our model that the 
logarithm of the rate of citations made per word is normally 
distributed. These adjusted degree results also imply 
that the model of log-normal rate of citation per word 
is a good fit to the data. This is an interesting result that 
contradicts the conclusions of citation analyses in academic 
journals ([!]). 

Fig. 5. Log Adjusted Indegree Distribution with N(−5.09, 1.49) Fit. 

Fig. 6. Log Adjusted Outdegree Distribution with N(−5.10, 1.01) Fit. 

5 Conclusion 



In this paper, we study the section citation network of the 
United States Code, the output of a complex and dynamic 
legislative process. We find that a power-law model is not 
a good fit for the degree distributions, though it is some- 
what plausible for the outdegree distribution. This result 
indicates that the common generative mechanisms for the 
power-law seen in other settings are not applicable. 

In order to better understand the citation dynamics in 
the Code, we evaluate an alternative hypothesis that the 
rate of citation is proportional to the amount of language 
within each section. Under this adjusted model, we find 
that the power-law is soundly rejected for both indegree 
and outdegree. Instead, the log-normal distribution is an 
excellent fit for both indegree and outdegree. This implies 
the unintuitive result that the logarithm of the number of 
citations per word follows a normal distribution. In sum- 
mary, our findings indicate that the power-law is not gen- 
erally applicable to citations in the United States Code 
but that the rate of citations per word obeys a log-normal 
distribution. 



